Today’s lab will guide you through the process of conducting a One-Sample t-test and an Independent Samples t-test. For each test, we will first go through the process of conducting the tests using the arithmetic and probability distributions in R. Then we will compare and contrast the functions in the {stats} and {lsr} packages.
Today we will be analyzing data from Fox and Guyer’s (1978) anonymity and cooperation study. The data are included in the {carData} package, and you can see information about the dataset using ?Guyer. Groups of four participants (Groups = 20; \(N_{TOTAL}\) = 80) each played 30 trials of the prisoner’s dilemma game. The number of cooperative choices (cooperation) was counted out of 120 (i.e., cooperative choices made by 4 participants over 30 trials). The groups made decisions either publicly or anonymously (condition), and groups were composed of either all women or all men (sex).
Run the following code to load the data into your global environment.
# load the package that contains the data
library(carData)
# load data
data <- Guyer
A One-Sample t-test tests whether some obtained sample mean is significantly different from a value specified by a researcher. Looking at the Guyer data, we might hypothesize that groups would cooperate more than 50% of the time (i.e., groups would cooperate on more than 60 trials). As such, the alternative hypothesis (\(H_{1}\)) would be that the population mean of cooperation is not equal to 60 (\(\mu \neq 60\)), whereas the null hypothesis (\(H_{0}\)) would be that the population mean of cooperation is equal to 60 (\(\mu = 60\)).
You might have noticed that our alternative hypothesis was that the mean cooperation was not equal to 60 (\(\mu \neq 60\)), rather than the mean cooperation being greater than 60 (\(\mu > 60\)). This is because we plan to run a two-sided t-test, which means we can declare significance if the mean cooperation is sufficiently greater than 60 or sufficiently less than 60. Two-sided t-tests are far more common than one-sided t-tests.
All statistical tests are essentially looking at a ratio of signal to noise. The One-sample t-test is no different.
Let’s look at the equation:
\[\frac{\bar X - \mu}{\hat \sigma / \sqrt{N}}\]
The numerator (\(\bar X - \mu\)) is the “signal”, or the difference between our sample mean and the value we are comparing it against, and the denominator (\(\hat \sigma / \sqrt{N}\)) is the “noise”, the imprecision (standard error) of our sample mean.
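To see how the two parts of the ratio interact, here is a minimal sketch that holds the signal fixed and changes only the sample size. The helper name `one_sample_t` and the three sample sizes are made up for illustration, and the mean, standard deviation, and comparison value are illustrative inputs rather than anything computed from the data at this point.

```r
# hypothetical helper: t-statistic from summary values
one_sample_t <- function(x_bar, mu, sigma_hat, n) {
  signal <- x_bar - mu            # numerator: distance from the comparison value
  noise  <- sigma_hat / sqrt(n)   # denominator: standard error of the mean
  signal / noise
}

# same signal and standard deviation, three illustrative sample sizes
n_values <- c(5, 20, 80)
t_values <- one_sample_t(x_bar = 48.3, mu = 60, sigma_hat = 14.29, n = n_values)

# the t-statistic moves further from zero as N grows
round(setNames(t_values, paste0("n=", n_values)), 2)
```

Because the noise shrinks with \(\sqrt{N}\), the same difference in means yields a larger t-statistic (in absolute value) in a larger sample.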
To make our lives easier, we will first calculate some descriptive statistics: (1) the mean of cooperation in our sample (coop_mean), (2) the standard deviation of cooperation in our sample (coop_sd), and (3) the number of observations (groups) in cooperation (coop_n). We should also calculate the degrees of freedom for our data (coop_df). In a One-Sample t-test, the degrees of freedom is equal to \(N - 1\).
# calculate descriptives
coop_mean <- mean(data$cooperation)
coop_sd <- sd(data$cooperation)
coop_n <- length(data$cooperation)
# calculate degrees of freedom
coop_df <- coop_n - 1
# look at the values
c("mean" = coop_mean,
  "sd" = coop_sd,
  "n" = coop_n,
  "df" = coop_df)
## mean sd n df
## 48.30000 14.28691 20.00000 19.00000
As we can now see, the mean value of cooperation is 48.30.
It looks like the average cooperation was actually less than 60, but this could simply be due to chance. Let’s calculate how probable it is to obtain a result at least as extreme as ours when the null hypothesis is, in fact, correct.
To do so, we will first calculate the t-statistic, which is as easy as inserting our acquired descriptive statistics into the equation above.
\[t = \frac{48.3 - 60}{14.29 / \sqrt{20}} = -3.66\]
Of course, we can also calculate this in R using the following code:
# calculate the t-statistic
coop_t <- (coop_mean - 60) / (coop_sd / sqrt(coop_n))
# print the t-statistic
c("t_stat" = coop_t)
## t_stat
## -3.662373
The t-statistic (-3.662373) is negative because our acquired mean was less than the value it was tested against. If the acquired mean had been greater than the value it was tested against, the t-statistic would be positive. It is completely fine to have a negative t-statistic, and you treat it exactly the same as a positive t-statistic.
Next, we can use the pt() function to calculate the probability of getting a t-statistic at least as extreme as -3.662373 from a t-distribution with 19 degrees of freedom when the null hypothesis is true. The pt() function takes three arguments: (1) q (the t-statistic), (2) df (the degrees of freedom), and (3) lower.tail (i.e., whether we want to find the cumulative probability for the upper or lower part of the distribution). To calculate a two-tailed significance test using our t-statistic, we would run the following code:
coop_p <- pt(q = abs(coop_t), df = coop_df, lower.tail = FALSE) * 2
We get a p-value of .001655805. In other words, there was an approximately .17% chance of getting a t-value at least as extreme as the one we obtained (in either direction) when the null hypothesis is true.
The result is multiplied by 2 above because we are running a two-sided t-test and wanted to be able to claim significance irrespective of whether the sample mean was above or below the comparison value of 60.
We could have run a one-sided t-test, but, as shown in the image below, a one-sided test in the direction we hypothesized (a mean greater than 60) would not have allowed us to declare our result significant, because our sample mean fell below 60.
I would also note that we took the absolute value (abs()) of the t-statistic so that we could use lower.tail = FALSE. If we had provided a negative t-statistic, we would have had to use lower.tail = TRUE.
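The two routes described above can be checked against each other. The following is a small sketch that uses the rounded t-statistic and degrees of freedom printed earlier rather than the objects in your environment; the variable names are chosen only for this example.

```r
# values taken from the worked example above (rounded as printed)
t_obs  <- -3.662373
df_obs <- 19

# route 1: absolute value of t with the upper tail
p_upper <- pt(q = abs(t_obs), df = df_obs, lower.tail = FALSE) * 2
# route 2: the negative t-statistic with the lower tail
p_lower <- pt(q = t_obs, df = df_obs, lower.tail = TRUE) * 2

# both routes give the same two-sided p-value
all.equal(p_upper, p_lower)

# one-sided test in the hypothesized direction (mean greater than 60):
# the area above a negative t-statistic is close to 1
p_one_sided <- pt(q = t_obs, df = df_obs, lower.tail = FALSE)
p_one_sided
```

Taking the absolute value is mainly a convenience: it lets the same line of code work whether the t-statistic is positive or negative.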
Let’s also calculate an effect size for our statistic. Cohen’s d is a very popular measure of effect size for the t-test, and it tells you the size of your effect (the difference between the acquired mean and the value specified) in standardized units. To calculate it, we just divide the difference by the standard deviation.
# calculate Cohen's D
coop_d <- (coop_mean - 60) / coop_sd
# print Cohen's D
coop_d
## [1] -0.8189315
Typically, we present a Cohen’s d value as a positive number. Whether Cohen’s d is positive or negative simply reflects whether the acquired mean was greater than or less than the specified mean.
The thresholds for a small, medium, and large effect size are shown below.
| d.value | interpretation |
|---|---|
| 0.2 | small |
| 0.5 | medium |
| 0.8 | large |
Interpreting our Cohen’s d value using the thresholds from the table, we might say that there was a large difference (d = -0.82) between our acquired mean (48.30) and the mean we were comparing it against (60.00).
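If you find yourself looking up the table often, the lookup can be sketched as a small function. The name `interpret_d` is hypothetical, and treating the table values as cut points (with anything under 0.2 labelled "below small") is an assumption made for this example, since the table itself only lists three thresholds.

```r
# hypothetical helper: label an effect size using the thresholds in the table
interpret_d <- function(d) {
  cut(abs(d),                                   # the sign of d does not matter here
      breaks = c(0, 0.2, 0.5, 0.8, Inf),
      labels = c("below small", "small", "medium", "large"),
      right = FALSE)                            # intervals are [lower, upper)
}

# our value, plus two made-up values for comparison
as.character(interpret_d(c(-0.82, 0.5, 0.1)))
```

These labels are rough conventions, so it is usually better to report the d value itself alongside any verbal description.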
Finally, we can also calculate the 95% confidence interval for our sample mean by running the following code:
# calculate 95% confidence interval
coop_low <- coop_mean + ((coop_sd / sqrt(coop_n)) * qt(.025, df = coop_df))
coop_up <- coop_mean + ((coop_sd / sqrt(coop_n)) * qt(.975, df = coop_df))
# print 95% confidence interval
c("95% CI Lower" = coop_low,
  "95% CI Upper" = coop_up)
## 95% CI Lower 95% CI Upper
## 41.61352 54.98648
The code may seem like a bit of a mess, but we are just taking our acquired mean (coop_mean) and adding to it the standard error of the mean (coop_sd / sqrt(coop_n)) multiplied by the t-values that cut off the lower 2.5% and the lower 97.5% of the t-distribution (qt(.025, df = coop_df) and qt(.975, df = coop_df)). Because qt(.025, df = coop_df) is negative, the first line yields the lower bound.
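Because the t-distribution is symmetric, the same interval can be written as the mean plus or minus a single margin of error. The sketch below assumes the summary values printed earlier (mean, standard deviation, and N) and uses variable names chosen only for this example.

```r
# summary values printed earlier in the lab
x_bar <- 48.3
s     <- 14.28691
n     <- 20

se     <- s / sqrt(n)            # standard error of the mean
t_crit <- qt(.975, df = n - 1)   # critical t-value for a 95% interval
margin <- t_crit * se            # half-width of the interval

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 3)
```

Writing the interval this way makes it easier to see that a larger sample (smaller standard error) or a lower confidence level (smaller critical value) both narrow the interval.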
The image above showed our acquired mean with a 95% confidence interval.
There are two useful functions for conducting a One-Sample t-test in R. The first is called t.test(), and it is automatically loaded as part of the {stats} package when you first open R. To run a One-Sample t-test using t.test(), you provide the function with the column of the data you are interested in (e.g., x = data$cooperation) and the mean value you want to compare the data against (e.g., mu = 60).
t.test(x = data$cooperation, mu = 60)
##
## One Sample t-test
##
## data: data$cooperation
## t = -3.6624, df = 19, p-value = 0.001656
## alternative hypothesis: true mean is not equal to 60
## 95 percent confidence interval:
## 41.61352 54.98648
## sample estimates:
## mean of x
## 48.3
As part of the output, you are provided the mean of cooperation, the t-statistic, the degrees of freedom, the p-value, and the 95% confidence interval. Unfortunately, we did not get a measure of the effect size.
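One possible workaround is to pull the pieces you need out of the object that t.test() returns and compute the effect size yourself. The sketch below uses a short vector of made-up scores (not the Guyer data) so that it runs on its own; for a one-sample test, dividing the t-statistic by \(\sqrt{N}\) gives the same value as dividing the mean difference by the standard deviation.

```r
# made-up scores, used only to keep the example self-contained
x  <- c(52, 61, 47, 38, 55, 49, 66, 41, 58, 44)
mu <- 60

res <- t.test(x = x, mu = mu)

# pull individual pieces out of the result object
t_val <- unname(res$statistic)   # t-statistic
df    <- unname(res$parameter)   # degrees of freedom
p_val <- res$p.value             # two-sided p-value

# for a one-sample test, d = (mean - mu) / sd, which equals t / sqrt(N)
d_from_t <- t_val / sqrt(length(x))
d_direct <- (mean(x) - mu) / sd(x)

c(d_from_t = d_from_t, d_direct = d_direct)
```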
The oneSampleTTest() function from the {lsr} package includes Cohen’s d automatically, but you have to load the package separately.
library(lsr)
oneSampleTTest(x = data$cooperation, mu = 60)
##
## One sample t-test
##
## Data variable: data$cooperation
##
## Descriptive statistics:
## cooperation
## mean 48.300
## std dev. 14.287
##
## Hypotheses:
## null: population mean equals 60
## alternative: population mean not equal to 60
##
## Test results:
## t-statistic: -3.662
## degrees of freedom: 19
## p-value: 0.002
##
## Other information:
## two-sided 95% confidence interval: [41.614, 54.986]
## estimated effect size (Cohen's d): 0.819
As you can see from the output, it provides you all of the information that t.test() did, but it also includes Cohen’s d.
An example of how to write-up a one-sample t-test is included below.
“The mean cooperation score of 48.30 (95% CI [41.61, 54.99]) was significantly less than 60, t(19) = -3.66, p = .002.”
Okay. But what if we wanted to compare the mean level of cooperation when decisions were made publicly with the mean level of cooperation when decisions were made anonymously? In that case, we would use an independent samples t-test, which compares the means of, well, two independent samples. For instance, I might suspect that the mean cooperation will be greater when decisions about cooperation are made publicly rather than anonymously. The alternative hypothesis (\(H_{1}\)) would be that the mean cooperation in the public group is not equal to the mean cooperation in the anonymous group. The null hypothesis (\(H_{0}\)) would be that the means of the two groups are equal.
To calculate an independent samples t-test by hand, we will want to start by splitting our data frame into two data frames according to whether the groups made decisions anonymously or publicly.
# load dplyr for %>% and filter()
library(dplyr)
data_public <- data %>%
  filter(condition == "public")
data_anonymous <- data %>%
  filter(condition == "anonymous")
Then, as we did for the One Sample t-test, we want to calculate descriptive statistics for both of the conditions.
# calculate group1 values
group1_mean <- mean(data_public$cooperation)
group1_sd <- sd(data_public$cooperation)
group1_n <- length(data_public$cooperation)
# calculate group2 values
group2_mean <- mean(data_anonymous$cooperation)
group2_sd <- sd(data_anonymous$cooperation)
group2_n <- length(data_anonymous$cooperation)
# print descriptive statistics
list("public" = c("mean" = group1_mean,
                  "sd" = group1_sd,
                  "n" = group1_n),
     "anonymous" = c("mean" = group2_mean,
                     "sd" = group2_sd,
                     "n" = group2_n))
## $public
## mean sd n
## 55.70000 14.84775 10.00000
##
## $anonymous
## mean sd n
## 40.900000 9.421606 10.000000
As we can see from the statistics, the number of groups in each condition was equal. The mean (55.70) and standard deviation (14.85) of cooperation in the public condition were both larger than the mean (40.90) and standard deviation (9.42) of cooperation in the anonymous condition.
To calculate the t-statistic, we will use a similar equation to that used for the one-sample t-test, but the numerator will be the difference between the condition means and the denominator will be the standard error of the difference between the means.
\[\frac{\bar X_1 - \bar X_2}{\sqrt{\frac{\hat \sigma_1^2 }{N_1}+\frac{\hat \sigma_2^2 }{N_2}}}\]
This is actually the equation for Welch’s t-test. Unlike the original Student’s t-test, Welch’s t-test does not assume that the two groups have equal variances. However, when the variances and sample sizes are equal, Welch’s t-test produces results that are nearly indistinguishable from those of Student’s t-test. There is rarely a need to use Student’s t-test (other than in cases with very small sample sizes). The functions for the independent samples t-test in {stats} and in {lsr} both use Welch’s t-test by default.
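# calculate the difference between the condition means
mean_diff <- group1_mean - group2_mean
# calculate the standard error of the difference
se <- sqrt(((group1_sd^2) / group1_n) + ((group2_sd^2) / group2_n))
# calculate the t-statistic
t_stat <- mean_diff / se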
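To get a feel for how close the Welch and Student versions can be, here is a rough sketch on simulated data. The group means, standard deviations, and seed are made up for the example; the only functions used are rnorm() and t.test() from base R. With equal group sizes, the two t-statistics coincide and only the degrees of freedom (and therefore the p-value) differ.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible

# simulated groups with equal sizes but different spreads (made-up values)
g1 <- rnorm(10, mean = 55, sd = 15)
g2 <- rnorm(10, mean = 41, sd = 9)

welch   <- t.test(g1, g2)                    # default: Welch's t-test
student <- t.test(g1, g2, var.equal = TRUE)  # Student's t-test (pooled variance)

# with equal group sizes the t-statistics match; the degrees of freedom differ
c(t_welch = unname(welch$statistic), t_student = unname(student$statistic))
c(df_welch = unname(welch$parameter), df_student = unname(student$parameter))
```

When the group sizes are unequal, the two t-statistics generally differ as well, which is one reason the Welch version is a reasonable default.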
Next, we will calculate a p-value for the t-statistic. The code for calculating the p-value is exactly the same as above, but the code for calculating the degrees of freedom of the t-test is a bit more involved.
The equation for the degrees of freedom is:
\[\mbox{df} = \frac{ ({\hat{\sigma}_1}^2 / N_1 + {\hat{\sigma}_2}^2 / N_2)^2 }{ ({\hat{\sigma}_1}^2 / N_1)^2 / (N_1 -1 ) + ({\hat{\sigma}_2}^2 / N_2)^2 / (N_2 -1 ) }\]
First, we can calculate the numerator.
df_num <- ((group1_sd^2 / group1_n) + (group2_sd^2 / group2_n))^2
Then, we can calculate both sides of the denominator.
df_den_1 <- ((group1_sd^2) / group1_n)^2 / (group1_n - 1)
df_den_2 <- ((group2_sd^2) / group2_n)^2 / (group2_n - 1)
And, finally, we can calculate our degrees of freedom:
# calculate degrees of freedom
df <- df_num / (df_den_1 + df_den_2)
# print out the degrees of freedom
df
## [1] 15.23659
You might notice that our degrees of freedom is not a whole number. That is because it is an estimate based on the variances and sample sizes of both conditions. In any case, now that we have our t-statistic and degrees of freedom, we can calculate a p-value using the same code as before:
# calculate p-value
p_val <- pt(q = abs(t_stat), df = df, lower.tail = FALSE) * 2
# print p-value
p_val
## [1] 0.0175996
It looks like the probability of obtaining a difference in means at least this large, when the null hypothesis is true, is 1.76%.
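The steps above can be collected into one small function, which is handy when you only have summary statistics (for example, from a published paper) rather than raw data. This is a sketch, not a replacement for t.test(); the function name `welch_from_summary` and its argument names are hypothetical.

```r
# hypothetical helper: Welch's t-test from means, SDs, and group sizes
welch_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  v1 <- sd1^2 / n1                      # squared standard error, group 1
  v2 <- sd2^2 / n2                      # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)   # difference in means over its standard error
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p  <- pt(q = abs(t_stat), df = df, lower.tail = FALSE) * 2
  c(t = t_stat, df = df, p = p)
}

# descriptive statistics printed above for the public and anonymous conditions
res <- welch_from_summary(m1 = 55.7, sd1 = 14.84775, n1 = 10,
                          m2 = 40.9, sd2 = 9.421606, n2 = 10)
round(res, 4)
```

The values should agree with the step-by-step calculation above up to rounding of the inputs.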
The code for calculating Cohen’s d is a bit more complex than what we used above, but, again, we are simply deriving the difference in means in standardized units.
# calculate cohen's d
d_val <- mean_diff / sqrt((group1_sd^2 + group2_sd^2) / 2)
# print cohen's d
d_val
## [1] 1.190259
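It looks like cooperation when decisions were made publicly was far higher than cooperation when decisions were made anonymously. As with the one-sample t-test, we can also calculate the 95% confidence interval, this time for the difference between the means: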
# calculate 95% confidence interval for the difference in means
ci_low <- mean_diff + (se * qt(.025, df = df))
ci_up <- mean_diff + (se * qt(.975, df = df))
As with the one-sample t-test, we can use the t.test() function from the built-in {stats} package to conduct an independent samples t-test.
t.test(data_public$cooperation, data_anonymous$cooperation)
##
## Welch Two Sample t-test
##
## data: data_public$cooperation and data_anonymous$cooperation
## t = 2.6615, df = 15.237, p-value = 0.0176
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.963498 26.636502
## sample estimates:
## mean of x mean of y
## 55.7 40.9
We can also use the independentSamplesTTest() function in the {lsr} package to get the output with Cohen’s d included.
independentSamplesTTest(formula = cooperation ~ condition, data = data)
##
## Welch's independent samples t-test
##
## Outcome variable: cooperation
## Grouping variable: condition
##
## Descriptive statistics:
## anonymous public
## mean 40.900 55.700
## std dev. 9.422 14.848
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: -2.661
## degrees of freedom: 15.237
## p-value: 0.018
##
## Other information:
## two-sided 95% confidence interval: [-26.637, -2.963]
## estimated effect size (Cohen's d): 1.19
The formula argument for independentSamplesTTest may look unfamiliar to some of you. The formula argument is using formula syntax, which you will use a lot when you start analyzing data using multiple regression models. In brief, whatever comes on the left of the tilde (~) is the dependent variable (in this case, cooperation) and whatever comes on the right of the tilde is the independent variable (in this case, condition).
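The t.test() function accepts the same formula syntax, which also helps explain why the sign of the t-statistic and confidence interval can flip between the two outputs above. The sketch below uses a small made-up data frame (not the Guyer data); the likely reason for the flip is that the formula interface subtracts the groups in factor-level order (alphabetical by default), whereas the two-vector call uses whatever order we pass in.

```r
# made-up data frame with the same layout as the lab data
toy <- data.frame(
  cooperation = c(58, 62, 49, 71, 55, 40, 37, 45, 42, 33),
  condition   = factor(rep(c("public", "anonymous"), each = 5))
)

# formula interface: groups are ordered by factor level (anonymous, then public)
by_formula <- t.test(cooperation ~ condition, data = toy)

# two-vector interface: the order is whatever we pass in (public, then anonymous)
public    <- toy$cooperation[toy$condition == "public"]
anonymous <- toy$cooperation[toy$condition == "anonymous"]
by_vector <- t.test(public, anonymous)

# same magnitude, opposite sign, because the subtraction order is reversed
c(formula = unname(by_formula$statistic), vectors = unname(by_vector$statistic))
```

Either sign is fine to report as long as the write-up makes clear which group had the higher mean.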
A proper write-up for our Independent Samples t-test would be:
“Cooperation in the public condition (M = 55.70, SD = 14.85) was significantly greater than cooperation in the anonymous condition (M = 40.90, SD = 9.42), t(15.24) = 2.66, p = .018, 95% CI [2.96, 26.64].”
You are welcome to work with a partner or in a small group of 2-3 people. Please feel free to ask the lab leader any questions you might have!
My advisor told me that I had multiple errors in my code. She told me the line numbers where the errors are, but she thought it would be a good learning experience for me to try to solve them myself. I need your help. Fix the errors in the following chunks of code.
Error 1:
I am trying to calculate the 95% confidence interval for my mean. I keep getting an error that says qnorm() is not working.
ci_lower <- 20 + ((5 / sqrt(120)) * qnorm(.025, 19, -.5))
ci_upper <- 20 + ((5 / sqrt(120)) * qnorm(.975, 19, -.5))
Error 2:
I am trying to calculate the p-value for a t-value of 3.10, but I keep getting a p-value that is greater than 1 (which I think is highly unlikely).
p_val <- pt(q = 3.10, df = 19, lower.tail = TRUE) * 2
Error 3:
I prefer t.test() over independentSamplesTTest() because I’m an R purist, but I wanted to calculate a Cohen’s d value using the t2d() function in the {psych} package. I don’t think it is working properly. I have a t-statistic of 5.12, and the sample sizes for my two groups are 15 and 22.
# load psych
library(psych)
# calculate cohen's d
t2d(t = 5.12, n = 15, n1 = 22)
You are reviewing a manuscript that claims people who were assigned to use a light editor theme in R Studio wrote better code than people who were assigned to use a dark editor theme in R Studio. Better code was operationalized as performance on a coding test (scored out of 25). The researchers state that the difference is significant at p = .043 (two-tailed), but you have your doubts. The researchers are unwilling to share the data with you, but the following values were in the manuscript.
\(\hat{\mu}_{DarkTheme}\) = 16.56
\(\hat{\mu}_{LightTheme}\) = 20.01
\(\hat{\sigma}_{DarkTheme}\) = 5.44
\(\hat{\sigma}_{LightTheme}\) = 3.12
\(n_{DarkTheme}\) = 10
\(n_{LightTheme}\) = 21
mean_diff <- 20.01 - 16.56
se <- sqrt(((3.12^2) / 21) + ((5.44^2) / 10))
t_stat <- mean_diff / se
# calculate degrees of freedom
df_num <- ((3.12^2 / 21) + (5.44^2 / 10))^2
df_den_1 <- ((3.12^2) / 21)^2 / (21 - 1)
df_den_2 <- ((5.44^2) / 10)^2 / (10 - 1)
df <- df_num / (df_den_1 + df_den_2)
# calculate p-value
pt(q = abs(t_stat), df = df, lower.tail = FALSE) * 2
## [1] 0.08703864
# calculate one-tailed p-value
pt(q = abs(t_stat), df = df, lower.tail = FALSE)
## [1] 0.04351932
The researchers behind the editor theme manuscript submit another paper on the difference in coding ability between Mac and PC users. Again, coding ability was operationalized as performance on a coding test scored out of 25. Despite the fear that you are becoming Reviewer Two, you acquire their data through GitHub to check their analyses.
Run the following lines of code to load the data:
# set seed for reproducibility
set.seed(42)
# load data
data_os <- data.frame("id" = 1:1e5,
                      "os" = c(rep("pc", 5e4),
                               rep("mac", 5e4)),
                      "ability" = c(rnorm(5e4, 15, 3),
                                    rnorm(5e4, 14.97, 3)))
independentSamplesTTest(ability ~ os, data = data_os)
##
## Welch's independent samples t-test
##
## Outcome variable: ability
## Grouping variable: os
##
## Descriptive statistics:
## mac pc
## mean 14.954 14.992
## std dev. 3.004 3.016
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: -2.006
## degrees of freedom: 99996.33
## p-value: 0.045
##
## Other information:
## two-sided 95% confidence interval: [-0.075, -0.001]
## estimated effect size (Cohen's d): 0.013
independentSamplesTTest(ability ~ os, data = data_os, var.equal = TRUE)
##
## Student's independent samples t-test
##
## Outcome variable: ability
## Grouping variable: os
##
## Descriptive statistics:
## mac pc
## mean 14.954 14.992
## std dev. 3.004 3.016
##
## Hypotheses:
## null: population means equal for both groups
## alternative: different population means in each group
##
## Test results:
## t-statistic: -2.006
## degrees of freedom: 99998
## p-value: 0.045
##
## Other information:
## two-sided 95% confidence interval: [-0.075, -0.001]
## estimated effect size (Cohen's d): 0.013
# The Cohen's d is .013. It is a very small effect.
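A quick way to see why such a tiny effect can still be statistically significant is to hold the effect size fixed and vary the sample size. The sketch below assumes two equal-sized groups and a Student's t-test, in which case \(t = d\sqrt{n/2}\) with \(n\) observations per group; the helper name `p_from_d` and the smaller sample size are made up for illustration.

```r
# hypothetical helper: two-sided p-value for a given Cohen's d and per-group n,
# assuming two equal-sized groups and a Student's t-test
p_from_d <- function(d, n_per_group) {
  t_stat <- d * sqrt(n_per_group / 2)   # t implied by d when group sizes are equal
  df <- 2 * n_per_group - 2
  pt(q = abs(t_stat), df = df, lower.tail = FALSE) * 2
}

# the same tiny effect at two illustrative sample sizes
p_small_n <- p_from_d(d = 0.013, n_per_group = 50)
p_large_n <- p_from_d(d = 0.013, n_per_group = 5e4)

c(n_50 = p_small_n, n_50000 = p_large_n)
```

The p-value tells you how surprising the data are under the null hypothesis, not how large the effect is, which is why the effect size belongs in the write-up as well.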